169 research outputs found

ALGORITHMS AND HIGH PERFORMANCE COMPUTING APPROACHES FOR SEQUENCING-BASED COMPARATIVE GENOMICS

Author: Langmead Benjamin Thomas
Publication date: 01/01/2012

As cost and throughput of second-generation sequencers continue to improve, even modestly resourced research laboratories can now perform DNA sequencing experiments that generate hundreds of billions of nucleotides of data, enough to cover the human genome dozens of times over, in about a week for a few thousand dollars. Such data are now being generated rapidly by research groups across the world, and large-scale analyses of these data appear often in high-profile publications such as Nature, Science, and The New England Journal of Medicine. But with these advances comes a serious problem: growth in per-sequencer throughput (currently about 4x per year) is drastically outpacing growth in computer speed (about 2x every 2 years). As the throughput gap widens over time, sequence analysis software is becoming a performance bottleneck, and the costs associated with building and maintaining the needed computing resources are burdensome for research laboratories. This thesis proposes two methods and describes four open source software tools that help to address these issues using novel algorithms and high-performance computing techniques. The proposed approaches build primarily on two insights. First, that the Burrows-Wheeler Transform and the FM Index, previously used for data compression and exact string matching, can be extended to facilitate fast and memory-efficient alignment of DNA sequences to long reference genomes such as the human genome. Second, that these algorithmic advances can be combined with MapReduce and cloud computing to solve comparative genomics problems in a manner that is scalable, fault tolerant, and usable even by small research groups.

Digital Repository at the University of Maryland

BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions

Author: Benjamin Langmead
Kasper D Hansen
Rafael A Irizarry
Publication venue: Springer Nature
Publication date: 01/01/2012

DNA methylation is an important epigenetic modification involved in gene regulation, which can now be measured using whole-genome bisulfite sequencing. However, cost, complexity of the data, and lack of comprehensive analytical tools are major challenges that keep this technology from becoming widely applied. Here we present BSmooth, an alignment, quality control and analysis pipeline that provides accurate and precise results even with low coverage data, appropriately handling biological replicates. BSmooth is open source software, and can be downloaded from http://rafalab.jhsph.edu/bsmooth

Springer - Publisher Connector

PubMed Central

Highly Scalable Short Read Alignment with the Burrows-Wheeler Transform and Cloud Computing

Author: Langmead Benjamin Thomas
Publication date: 01/01/2009

Improvements in DNA sequencing have both broadened its utility and dramatically increased the size of sequencing datasets. Sequencing instruments are now used regularly as sources of high-resolution evidence for genotyping, methylation profiling, DNA-protein interaction mapping, and characterizing gene expression in the human genome and in other species. With existing methods, the computational cost of aligning short reads from the Illumina instrument to a mammalian genome can be very large: on the order of many CPU months for one human genotyping project. This thesis presents a novel application of the Burrows-Wheeler Transform that enables the alignment of short DNA sequences to mammalian genomes at a rate much faster than existing hashtable-based methods. The thesis also presents an extension of the technique that exploits the scalability of cloud computing to perform the equivalent of one human genotyping project in hours.

Digital Repository at the University of Maryland

B-SOLANA: an approach for the analysis of two-base encoding bisulfite sequencing data

Author: Andre Franke
Benjamin Kreck
Bock
Bormann
Felix Krueger
George Marnellos
Hansen
Holliday
Julia Richter
Langmead
Li
Lister
Ondov
Pedersen
Pelizzola
Reiner Siebert
Publication venue: Oxford University Press

Summary: Bisulfite sequencing, a combination of bisulfite treatment and high-throughput sequencing, has proved to be a valuable method for measuring DNA methylation at single base resolution. Here, we present B-SOLANA, an approach for the analysis of two-base encoding (colorspace) bisulfite sequencing data on the SOLiD platform of Life Technologies. It includes the alignment of bisulfite sequences and the determination of methylation levels in CpG as well as non-CpG sequence contexts. B-SOLANA enables a fast and accurate analysis of large raw sequence datasets

Crossref

PubMed Central

Computational pan-genomics: status, promises and challenges

Author: Abeel Thomas
Alkan Can
Baaijens Jasmijn
Bakker Paul
Boeva Valentina
Bonnal Raoul
Chiaromonte Francesca
Chikhi Rayan
Ciccarelli Francesca
Cijvat Robin
Datema Erwin
Dijkstra Louis
Duijn Cornelia
Dutilh Bas
Eichler Evan
El-Kebir Mohammed
Ernst Corinna
Eskin Eleazar
Garrison Erik
Ghaffaari Ali
Guryev Victor
Kersey Paul
Klau Gunnar
Kloosterman Wigard
Korbel Jan
Lameijer Eric-Wubbo
Langmead Benjamin
Marschall Tobias
Martin Marcel
Marz Manja
Medvedev Paul
Mu John
Mäkinen Veli
Neerincx Pieter
Novak Adam
Ouwens Klaasjan
Paten Benedict
Peterlongo Pierre
Pisanti Nadia
Porubsky David
Rahmann Sven
Raphael Benjamin
Reinert Knut
Ridder Dick
Ridder Jeroen
Rivals Eric
Sanders Ashley
Schlesner Matthias
Schulz-Trieglaff Ole
Schönhuth Alexander
Sheikhizadeh Siavash
Shneider Carl
Smit Sandra
The Computational Pan-Genomics Consortium
Valenzuela Daniel
Vandin Fabio
Wang Jiayin
Wessels Lodewyk
Ye Kai
Zhang Ying
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2018

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

INRIA a CCSD electronic archive server

Archivio della Ricerca - Università di Pisa

EUR Research Repository

HAL-MINES ParisTech

Archivio della ricerca della Scuola Superiore Sant'Anna

Radboud Repository

HAL-Rennes 1

Virulence Regulator EspR of Mycobacterium tuberculosis Is a Nucleoid-Associated Protein

Author: A Garces
AM Abdallah
B Blasco
B Langmead
Benjamin Blasco
BR Gordon
C Sala
Claudia Sala
D Skoko
DF Browning
Eric J. Rubin
F Forti
Florence Pojer
G Rey
IC Werlang
J Becq
J Gonzalo-Asensio
Jacques Rougemont
Jeffrey M. Chen
JM Tufariello
JS Cox
L Richter
L Rickman
L Shapiro
LJ Zhu
LR Camacho
N van der Wel
OS Rosenberg
P Brodin
PA Fujita
R Edgar
R Simeone
RC Hartkoorn
RT Dame
Ruben Hartkoorn
S Homolka
S Raghavan
SB Walters
SC Dillon
SM Fortune
ST Cole
Stewart T. Cole
SV Gordon
Swapna Uplekar
T Hsu
TA Azam
TL Bailey
W Bitter
Publication venue: Public Library of Science
Publication date: 01/01/2012

The principal virulence determinant of Mycobacterium tuberculosis (Mtb), the ESX-1 protein secretion system, is positively controlled at the transcriptional level by EspR. Depletion of EspR reportedly affects a small number of genes, both positively and negatively, including a key ESX-1 component, the espACD operon. EspR is also thought to be an ESX-1 substrate. Using EspR-specific antibodies in ChIP-Seq experiments (chromatin immunoprecipitation followed by ultra-high throughput DNA sequencing) we show that EspR binds to at least 165 loci on the Mtb genome. Included in the EspR regulon are genes encoding not only EspA, but also EspR itself, the ESX-2 and ESX-5 systems, a host of diverse cell wall functions, such as production of the complex lipid PDIM (phenolthiocerol dimycocerosate) and the PE/PPE cell-surface proteins. EspR binding sites are not restricted to promoter regions and can be clustered. This suggests that rather than functioning as a classical regulatory protein EspR acts globally as a nucleoid-associated protein capable of long-range interactions consistent with a recently established structural model. EspR expression was shown to be growth phase-dependent, peaking in the stationary phase. Overexpression in Mtb strain H37Rv revealed that EspR influences target gene expression both positively and negatively, leading to growth arrest. At no stage was EspR secreted into the culture filtrate. Thus, rather than serving as a specific activator of a virulence locus, EspR is a novel nucleoid-associated protein, with both architectural and regulatory roles, that impacts cell wall functions and pathogenesis through multiple genes.

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Tuning transcription factor availability through acetylation-mediated genomic redistribution

Author: Aksan
Alessia Loffreda
Alexander Schepsky
Benjamin Schuster-Böckler
Benjamin Thomas
Bertolotto
Biggin
Blackwell
Boyes
Brewster
Bricambert
Carreira
Carreira
Ceseña
Cheli
Cheli
Chen
Chen
Colin R. Goding
Cui
Daitoku
Davide Mazza
Dobin
Dugo
E. Elizabeth Patton
Eda Suer
Eiríkur Steingrímsson
Elf
Fisher
Fisher
Fisher
Fock
Fogh
Garraway
Gebhardt
Giandomenico
Giard
Giuliano
Goding
Goding
Goodall
Grimm
Gu
Hans Friedrichsen
Hansen
Heinz
Hejna
Hodgkinson
Hoek
Hoek
Hosokawa
Irwin Davidson
Izeddin
Jean-Philippe Lambert
Ji
Kalderon
Konieczkowski
Langmead
Laurette
Lickwar
Lister
Liu
Loffreda
Louphrasitthiphol
Lowings
Machanick
Magnúsdóttir
Malcov-Brog
Mark Middleton
Matthias Wilmanns
Mazza
Mazza
McGill
McGill
McNally
Michelman-Ribeiro
Min Lu
Mishra
Morisaki
Möller
Müller
Ngeow
Pakavarin Louphrasitthiphol
Panagis Filippakopoulos
Perrot
Ploper
Pogenberg
Ponugoti
Price
Qiu
Quinlan
Ramírez
Rasmus Freter
Richard Lisle
Robert Siddaway
Rodolosse
Saldanha
Sato
Schepsky
Segal
Solomon
Speil
Sprague
Stavreva
Stenoien
Strub
Sugo
Teves
Thomas Strub
Thurber
Tokunaga
Tsujimura
van Royen
Verfaillie
Vivian Pogenberg
von Hippel
Wang
Wang
Webster
Westerfield
Widlund
Xin Lu
Ye
Zakut
Zhang
Zhang
Zhiqiang Zeng
Publication venue: Elsevier BV
Publication date: 01/01/2020

It is widely assumed that decreasing transcription factor DNA-binding affinity reduces transcription initiation by diminishing occupancy of sequence-specific regulatory elements. However, in vivo transcription factors find their binding sites while confronted with a large excess of low-affinity degenerate motifs. Here, using the melanoma lineage survival oncogene MITF as a model, we show that low-affinity binding sites act as a competitive reservoir in vivo from which transcription factors are released by mitogen-activated protein kinase (MAPK)-stimulated acetylation to promote increased occupancy of their regulatory elements. Consequently, a low-DNA-binding-affinity acetylation-mimetic MITF mutation supports melanocyte development and drives tumorigenesis, whereas a high-affinity non-acetylatable mutant does not. The results reveal a paradoxical acetylation-mediated molecular clutch that tunes transcription factor availability via genome-wide redistribution and couples BRAF to tumorigenesis. Our results further suggest that p300/CREB-binding protein-mediated transcription factor acetylation may represent a common mechanism to control transcription factor availability

Crossref

Opin visindi

Edinburgh Research Explorer

Oxford University Research Archive

Mechanisms of stretch-mediated skin expansion at single-cell resolution.

Author: A Mamidi
A Scialdone
A Sánchez-Danés
AB Tepole
Alejandro Sifrim
AM Zöllner
AS Hopkin
AT Lun
B Langmead
Benjamin D. Simons
Benjamin Swedlund
C Blanpain
C Luxenburg
Christine Dubois
CY McLean
Cédric Blanpain
D Damiani
DJ McCarthy
E Becht
E Clayton
E Gonzalez-Roca
E Moreno
E Rognoni
EA Susaki
Fadel Tissir
G Mascré
G Posern
Gaëlle Lapouge
H Hirata
H Li
HJ Snippert
HQ Le
JA Segre
Jens Van Herck
JT Connelly
K Rottner
K Street
K Yang
Katlijn Vints
KH Vining
KR Mesa
L LeGoff
M Aragona
M Duda
M Xin
Mariaceleste Aragona
MC Obdeijn
MD Muzumdar
Milan Malfait
MM Nava
P Rompolas
Pieter Baatsen
R Eferl
RG Lopez
RJ Whitson
S Aibar
S Anders
S Heinz
S Joost
S Yonemura
SA Wickström
Seungmin Han
SJ Ellis
Sophie Dekoninck
Souhir Gargouri
T Iskratsch
T Panciera
T Stuart
Thierry Voet
TJ Kirby
V Vasioukhin
VA Botchkarev
X Lim
Y Zhang
YA Miroshnikova
Yura Song
Publication venue: Nature
Publication date: 01/01/2020

The ability of the skin to grow in response to stretching has been exploited in reconstructive surgery. Although the response of epidermal cells to stretching has been studied in vitro, it remains unclear how mechanical forces affect their behaviour in vivo. Here we develop a mouse model in which the consequences of stretching on skin epidermis can be studied at single-cell resolution. Using a multidisciplinary approach that combines clonal analysis with quantitative modelling and single-cell RNA sequencing, we show that stretching induces skin expansion by creating a transient bias in the renewal activity of epidermal stem cells, while a second subpopulation of basal progenitors remains committed to differentiation. Transcriptional and chromatin profiling identifies how cell states and gene-regulatory networks are modulated by stretching. Using pharmacological inhibitors and mouse mutants, we define the step-by-step mechanisms that control stretch-mediated tissue expansion at single-cell resolution in vivo. Wellcome Trust Royal Societ

A Computational and Experimental Study of the Regulatory Mechanisms of the Complement System

Author: A Krarup
A Thern
AA Korotaevskiy
AF Zadura
AM Blom
Anna M. Blom
AP Sjoberg
B Dahlback
B Liu
B Selander
Benjamin Leong
Bing Liu
Bow Ho
C Mold
CJ Langmead
D Ricklin
D Sheskin
David Hsu
DH Anderson
DT Fearon
G Koh
H Hirayama
I Gigli
J Gunawardena
J Scharfstein
J Zhang
Jeak Ling Ding
JH Griffin
Jing Zhang
K Berggard
KP Murphy
L Marnell
L Truedsson
LM Boerger
M Bozga
M Jaeger
M Matsushita
M Okroj
MH Kaplan
MJ Walport
MV Suresh
N Rawal
NA van Riel
NV Prasadarao
P. S. Thiagarajan
Pei Yi Tan
PM Ng
R Veerhuis
Rustom Antia
S Hoops
S Jha
S Ram
SR Barnum
Sunil Sethi
T Fujita
T Meri
T Nordstrom
V Kirjavainen
X Ji
Z Fishelson
Z Zi
Publication venue: Public Library of Science (PLoS)
Publication date: 01/06/2010

The complement system is key to innate immunity and its activation is necessary for the clearance of bacteria and apoptotic cells. However, insufficient or excessive complement activation will lead to immune-related diseases. It is so far unknown how the complement activity is up- or down-regulated and what the associated pathophysiological mechanisms are. To quantitatively understand the modulatory mechanisms of the complement system, we built a computational model involving the enhancement and suppression mechanisms that regulate complement activity. Our model consists of a large system of Ordinary Differential Equations (ODEs) accompanied by a dynamic Bayesian network as a probabilistic approximation of the ODE dynamics. Applying Bayesian inference techniques, this approximation was used to perform parameter estimation and sensitivity analysis. Our combined computational and experimental study showed that the antimicrobial response is sensitive to changes in pH and calcium levels, which determine the strength of the crosstalk between CRP and L-ficolin. Our study also revealed differential regulatory effects of C4BP. While C4BP delays but does not decrease the classical complement activation, it attenuates but does not significantly delay the lectin pathway activation. We also found that the major inhibitory role of C4BP is to facilitate the decay of C3 convertase. In summary, the present work elucidates the regulatory mechanisms of the complement system and demonstrates how the bio-pathway machinery maintains the balance between activation and inhibition. The insights we have gained could contribute to the development of therapies targeting the complement system. Singapore. Ministry of Education (Grant T208B3109); Singapore. Agency for Science, Technology and Research (BMRC 08/1/21/19/574); Singapore-MIT Alliance (Computational and Systems Biology Flagship Project); Swedish Research Counci

Public Library of Science (PLOS)

CiteSeerX

DSpace@MIT

Lund University Publications

Crossref

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS